pacman::p_load(arrow, here, lubridate, maptools, tidyverse, tmap, sf, sp, spatstat, spNetwork)Take-home Exercise 1: Application of Spatial Point Patterns Analysis to discover the geographical distribution of Grab hailing services in Singapore
1.0 Introduction
1.1. Overview - Setting the Scene
Human mobility, the movement of human beings in space and time, reflects the spatial-temporal characteristics of human behavior. With the advancement Information and Communication Technologies (ICT) especially smart phone, a large volume of data related to human mobility have been collected. By using appropriate GIS analysis methods, these data are potentially useful in supporting smart city planning and management.
In Singapore, one of the important source of data related to human mobility is from Land Transport Authority (LTA) DataMall. Two data sets related to human mobility are provided by the portal, they are: Passenger Volume by Origin Destination Train Stations and Passenger Volume by Origin Destination Bus Stops. One of the limitation of these data sets is that their location are biased to either bus stops or MRT/LRT stations. In 2020, another very interesting human mobility data set called Grab Posisi was released by GRAB, one of the largest shared taxi operator in South-east Asia. There are two data sets been released and one of them is for Singapore.
1.2 Objectives
Geospatial analytics hold tremendous potential to address complex problems facing society.
In this study, we will be working on the following tasks:
Apply appropriate spatial point patterns analysis methods
Discover the geographical and spatial-temporal distribution of Grab hailing services locations in Singapore
1.3 Getting Started
In this take-home exercise, we will be using the following packages:
arrow
sf for handling geospatial data
tidyverse for performing data science tasks such as importing, wrangling and visualising data
2.0 Data Acquisition
There will be 3 data sets used in this exercise:
| Data | Format | Description | Source |
|---|---|---|---|
| Grab Posisi | Parquet | Grab taxi location points either by origins or destinations | Grab |
| Road | Shapefile | Road layer within Singapore excluding outer islands | Geofabrik download server |
| Master Plan 2019 Subzone Boundary (No Sea) | GeoJSON | Singapore boundary layer excluding outer islands | Data.gov.sg |
2.1 Extracting Geospatial and Aspatial Data Sets
Start by creating a new folder labeled Take-home_Ex01. Within this folder, create a sub-folder named data. Inside the data sub-folder, create two additional sub-folders and rename them geospatial and aspatial respectively.
Unzip the malaysia-singapore-brunei-latest-free.shp.zip folder and place all files, MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson into geospatial sub-folder.
Place all files from GrabPosisi into aspatial sub-folder.
Did you observe that the file names from GrabPosisi and MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson are quite lengthy? Shortening them could make processing more convenient later on.
Alternatively, we can list.files() to get a list of filenames that contains .parquet extension.
3.0 Geospatial Data Wrangling
3.1 Data Aggregation: Importing and Combining Aspatial parquet files
# Use list.files to get a list of filenames that match the pattern
parquet_files <- list.files(path = "data/aspatial", pattern = "\\.parquet$", full.names = TRUE)
grab_data <- data.frame()
for (file_path in parquet_files) {
grab_data <- bind_rows(grab_data, read_parquet(file_path))
}3.2 Data Export: Writing DataFrame to RDS file
write_rds(grab_data, "data/rds/grab_data.rds")3.3 Data Import
3.3.1 Importing Aspatial Data - Grab Posisi data in RDS format
grab_df <- read_rds("data/rds/grab_data.rds")str(grab_df)'data.frame': 30329685 obs. of 9 variables:
$ trj_id : chr "70014" "73573" "75567" "1410" ...
$ driving_mode : chr "car" "car" "car" "car" ...
$ osname : chr "android" "android" "android" "android" ...
$ pingtimestamp: int 1554943236 1555582623 1555141026 1555731693 1555584497 1555395258 1554768955 1554783532 1554898418 1555593189 ...
$ rawlat : num 1.34 1.32 1.33 1.26 1.28 ...
$ rawlng : num 104 104 104 104 104 ...
$ speed : num 18.9 17.7 14 13 14.8 ...
$ bearing : int 248 44 34 181 93 73 82 321 324 31 ...
$ accuracy : num 3.9 4 3.9 4 3.9 ...
What we can observe is that the grab data contains 30329685 observations and 9 variables. Notice that pingtimestamp is in the wrong format. It should be in date/time format and not integer. We will need to convert the data type in the next section (Data Preparation).
3.3.2 Importing Geospatial Data - Road data in shapefile format
We can import geospatial data into RStudio using st_read() of sf package. Let’s try it now!
roads_sf <- st_read(dsn="data/geospatial",
layer="gis_osm_roads_free_1")Reading layer `gis_osm_roads_free_1' from data source
`C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 1759836 features and 10 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS: WGS 84
str(roads_sf)Classes 'sf' and 'data.frame': 1759836 obs. of 11 variables:
$ osm_id : chr "4386520" "4578273" "4579495" "4579533" ...
$ code : int 5113 5114 5122 5122 5122 5122 5141 5122 5122 5122 ...
$ fclass : chr "primary" "secondary" "residential" "residential" ...
$ name : chr "Orchard Road" "Jalan Bukit Bintang" "Jalan Nagasari" "Persiaran Raja Chulan" ...
$ ref : chr NA NA NA NA ...
$ oneway : chr "F" "F" "B" "B" ...
$ maxspeed: int 50 0 0 0 0 0 0 0 0 0 ...
$ layer : num 0 0 0 0 0 0 -1 0 0 0 ...
$ bridge : chr "F" "F" "F" "F" ...
$ tunnel : chr "F" "F" "F" "F" ...
$ geometry:sfc_LINESTRING of length 1759836; first list element: 'XY' num [1:2, 1:2] 103.83 103.83 1.31 1.31
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
..- attr(*, "names")= chr [1:10] "osm_id" "code" "fclass" "name" ...
3.3.3 Importing Geospatial Data - Master Plan 2019 Subzone Boundary (No Sea) in shapefile format
mpsz_sf <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019")Reading layer `MPSZ-2019' from data source
`C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
str(mpsz_sf)Classes 'sf' and 'data.frame': 332 obs. of 7 variables:
$ SUBZONE_N : chr "MARINA EAST" "INSTITUTION HILL" "ROBERTSON QUAY" "JURONG ISLAND AND BUKOM" ...
$ SUBZONE_C : chr "MESZ01" "RVSZ05" "SRSZ01" "WISZ01" ...
$ PLN_AREA_N: chr "MARINA EAST" "RIVER VALLEY" "SINGAPORE RIVER" "WESTERN ISLANDS" ...
$ PLN_AREA_C: chr "ME" "RV" "SR" "WI" ...
$ REGION_N : chr "CENTRAL REGION" "CENTRAL REGION" "CENTRAL REGION" "WEST REGION" ...
$ REGION_C : chr "CR" "CR" "CR" "WR" ...
$ geometry :sfc_MULTIPOLYGON of length 332; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:300, 1:2] 104 104 104 104 104 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
..- attr(*, "names")= chr [1:6] "SUBZONE_N" "SUBZONE_C" "PLN_AREA_N" "PLN_AREA_C" ...
We can observe that both roads_sf and mpsz_sf are currently using the WGS 84 geographic coordinate system.
After looking at the file structure and contents of the 3 datasets, we need to perform the following preparations:
grab_dfConverting data type
Extracting trip starting location
Converting aspatial data into geospatial data
roads_sfandmpsz_sfPerforming projection transformation
Extracting study area
Getting road layer within Singapore excluding outer islands
3.4 Data Preparation
grab_df
3.4.1 Preparing and Converting grab_df into grab_sf (sf tibble data.framedata)
# Converting data type using as_datetime()
grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)
# Checking first n rows of data frame using head()
head(grab_df)
grab_origins_sf <- grab_df %>%
group_by(trj_id) %>%
arrange(pingtimestamp) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
start_hr = hour(pingtimestamp),
day = mday(pingtimestamp)) %>%
st_as_sf(coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)
# Getting geometry details using st_geometry()
st_geometry(grab_origins_sf)- 1
-
Converting data type for
pingtimestampfrom integer into datetime using as_datetime() from lubridate package - 2
- By default, the head() returns the first 6 rows
- 3
-
Converting
grab_dfinto sf tibble data.frame using st_as_sf() and its location information - 4
- Transforming coordinates using st_transform()
trj_id driving_mode osname pingtimestamp rawlat rawlng speed
1 70014 car android 2019-04-11 00:40:36 1.342326 103.8890 18.91000
2 73573 car android 2019-04-18 10:17:03 1.321781 103.8564 17.71908
3 75567 car android 2019-04-13 07:37:06 1.327088 103.8613 14.02155
4 1410 car android 2019-04-20 03:41:33 1.262482 103.8238 13.02652
5 4354 car android 2019-04-18 10:48:17 1.283799 103.8072 14.81294
6 32630 car android 2019-04-16 06:14:18 1.300330 103.9062 23.23818
bearing accuracy
1 248 3.9
2 44 4.0
3 34 3.9
4 181 4.0
5 93 3.9
6 73 3.9
Geometry set for 28000 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3628.243 ymin: 25198.14 xmax: 49845.23 ymax: 49689.64
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
3.4.2 Creating ppp objects
grab_origins_ppp <- grab_origins_sf %>%
as('Spatial') %>%
as('SpatialPoints') %>%
as('ppp')
summary(grab_origins_ppp)Planar point pattern: 28000 points
Average intensity 2.473666e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
(46220 x 24490 units)
Window area = 1131920000 square units
# Checking for duplicated points
any(duplicated(grab_origins_ppp))[1] FALSE
plot(grab_origins_ppp)
3.4.3 Performing projection transformation
To perform projection transformation, we will st_transform(). Additionally, we will use st_geometry() to inspect the contents of roads_sf and mpsz_sf after the projection transformation.
roads_sf
# Transforming coordinates using st_transform()
roads_sf <- st_transform(roads_sf,
crs = 3414)
# Getting geometry details using st_geometry()
st_geometry(roads_sf)Geometry set for 1759836 features
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -434085.6 ymin: -23036.54 xmax: 1759022 ymax: 741993.8
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
mpsz_sf
# Transforming coordinates using st_transform()
mpsz_sf <- st_transform(mpsz_sf,
crs = 3414)
# Getting geometry details using st_geometry()
st_geometry(mpsz_sf)Geometry set for 332 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
Now, we observe that both roads_sf and mpsz_sf are using the SVY21 projected coordinates system.
3.4.4 Extracting specific planning area - Downtown Core
Due to the intensive computational power required to conduct analysis on the whole of Singapore, we will be focusing on the Downtown Core planning area for this analysis as we want to investigate the spatial data point analysis in the Central Business District areas.
sg_downtown_sf <- mpsz_sf %>%
filter(PLN_AREA_N == "DOWNTOWN CORE")plot(sg_downtown_sf["PLN_AREA_N"], main = "DOWNTOWN CORE")
#sg_sf <- mpsz_sf %>%
# st_union()3.4.5 Creating owin object - Downtown Core
downtown_owin <- as.owin(sg_downtown_sf)
summary(downtown_owin)Window: polygonal boundary
single connected closed polygon with 637 vertices
enclosing rectangle: [28896.26, 31980.09] x [27914.19, 31855.48] units
(3084 x 3941 units)
Window area = 5083640 square units
Fraction of frame area: 0.418
plot(downtown_owin)
3.4.6 Combining grab data points with study area - Downtown Core
summary(downtown_owin)Window: polygonal boundary
single connected closed polygon with 637 vertices
enclosing rectangle: [28896.26, 31980.09] x [27914.19, 31855.48] units
(3084 x 3941 units)
Window area = 5083640 square units
Fraction of frame area: 0.418
grab_origins_dt_ppp <- grab_origins_ppp[downtown_owin]
plot(grab_origins_dt_ppp, main="DOWNTOWN CORE")
3.4.7 Getting grab points within Downtown Core
sg_downtown_grab_sf <- st_intersection(grab_origins_sf, sg_downtown_sf)3.4.8 Getting road layer within Singapore excluding outer islands
In order to get the road layer within Singapore (excluding outer islands), we can use st_intersection() from sf package to confine the analysis to Singapore’s boundary (in specifics, confining to the Downtown core planning area).
sg_downtown_road_sf <- st_intersection(roads_sf, sg_downtown_sf)
sg_downtown_road_sf <- st_cast(sg_downtown_road_sf, "LINESTRING")3.4.9 Visualizing the Geospatial data
tmap_mode('view')
tm_shape(sg_downtown_grab_sf) +
tm_dots()tmap_mode('plot')4.0 1st Order Spatial Point Pattern Analysis (SPPA)
4.1 Kernel Density Estimation (KDE)
kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")
grab_origins_dt_ppp.km <- rescale(grab_origins_dt_ppp, 1000, "km")kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")
Working with different automatic bandwidth methods and kernels
par(mfrow=c(2,4))
# Using bw.diggle() bw
plot(density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Gaussian")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="epanechnikov"),
main="Epanechnikov")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="quartic"),
main="Quartic")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="disc"),
main="Disc")
# Using bw.ppl() bw
par(mfrow=c(2,2))
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian"),
main="Gaussian")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="epanechnikov"),
main="Epanechnikov")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="quartic"),
main="Quartic")
plot(density(grab_origins_dt_ppp.km,
sigma=bw.ppl,
edge=TRUE,
kernel="disc"),
main="Disc")
4.2 Network Kernel Density Estimation (NKDE)
4.2.1 Visualizing the Geospatial data
tmap_mode('view')
tm_shape(sg_downtown_grab_sf) +
tm_dots() +
tm_shape(sg_downtown_road_sf) +
tm_lines()tmap_mode('plot')4.2.2 Preparing the lixel objects
lixels <- lixelize_lines(sg_downtown_road_sf,
700,
mindist = 350)4.2.3 Generating line centre points
samples <- lines_center(lixels)4.2.4 Performing NetKDE
densities <- nkde(sg_downtown_road_sf,
events = sg_downtown_grab_sf,
w = rep(1,nrow(sg_downtown_grab_sf)),
samples = samples,
kernel_name = "quartic",
bw = 300,
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5, #we aggregate events within a 5m radius (faster calculation)
sparse = TRUE,
verbose = FALSE)samples$density <- densities
lixels$density <- densities
# rescaling to help the mapping
samples$density <- samples$density*1000
lixels$density <- lixels$density*1000tmap_mode('view')
tm_shape(lixels)+
tm_lines(col="density") +
tm_shape(sg_downtown_grab_sf) +
tm_dots()tmap_mode('plot')